Bilingual Multi-Word Lexicon Construction via a Pivot Language

نویسندگان

  • Hyeong-Won Seo
  • Hong-Seok Kwon
  • Min-ah Cheon
  • Jae-Hoon Kim
چکیده

Bilingual multi-word lexicons are helpful for statistical machine translation systems to improve their performance. In this paper we present a method for constructing such lexicons in a resource-poor language pair such as Korean-French. By using two parallel corpora sharing one pivot language we can easily construct such lexicons without any external language resource like a seed dictionary. The experimental results for the KR to FR have shown that the accuracy is quite promising, even though this research is ongoing.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Building Bilingual Lexicons using Lexical Translation Probabilities via Pivot Languages

This paper proposes a method of increasing the size of a bilingual lexicon obtained from two other bilingual lexicons via a pivot language. When we apply this approach, there are two main challenges, ambiguity and mismatch of terms; we target the latter problem by improving the utilization ratio of the bilingual lexicons. Given two bilingual lexicons between language pairs Lf –Lp and Lp–Le, we ...

متن کامل

Bilingual Lexicon Extraction via Pivot Language and Word Alignment Tool

This paper presents a simple and effective method for automatic bilingual lexicon extraction from less-known language pairs. To do this, we bring in a bridge language named the pivot language and adopt information retrieval techniques combined with natural language processing techniques. Moreover, we use a freely available word aligner: Anymalign (Lardilleux et al., 2011) for constructing conte...

متن کامل

Using Bilingual Segments in Generating Word-to-word Translations

We defend that bilingual lexicons automatically extracted from parallel corpora, whose entries have been meanwhile validated by linguists and classified as correct or incorrect, should constitute a specific parallel corpora. And, in this paper, we propose to use word-to-word translations to learn morph-units (comprising of bilingual stems and suffixes) from those bilingual lexicons for two lang...

متن کامل

Evaluating a Pivot-Based Approach for Bilingual Lexicon Extraction

A pivot-based approach for bilingual lexicon extraction is based on the similarity of context vectors represented by words in a pivot language like English. In this paper, in order to show validity and usability of the pivot-based approach, we evaluate the approach in company with two different methods for estimating context vectors: one estimates them from two parallel corpora based on word as...

متن کامل

Bilingual Lexicon Generation Using Non-Aligned Signatures

Bilingual lexicons are fundamental resources. Modern automated lexicon generation methods usually require parallel corpora, which are not available for most language pairs. Lexicons can be generated using non-parallel corpora or a pivot language, but such lexicons are noisy. We present an algorithm for generating a high quality lexicon from a noisy one, which only requires an independent corpus...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014